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We implement lattice QCD using the Fortran 90 language. We have designed machine independent modules 
that define fields (gauge, fermions, scalars, etc..) and have defined overloaded operators for all possible operations 
between fields, matrices and numbers. With these modules it is very simple to write QCD programs. We have 
also created a useful compression standard for storing the lattice configurations, a parallel implementation of 
the random generators, an assignment that does not require temporaries, and a machine independent precision 
definition. We have tested our program on parallel and single processor supercomputers obtaining excellent 
performances. 



1. Introduction 

With the twofold goal of facilitating the devel- 
opment of algorithms and applications for lattice 
QCD, and of maintaining good code performance, 
we have taken advantage of the possibilities of- 
fered by Fortran 90 to write a set of modules for 
a high-level, yet efficient implementation of QCD 
simulations. In particular Fortran 90 offers the 
possibility to define both types and overloaded 
operators. These two key features make Fortran 
90 particularly suitable for QCD simulations. 

Our end product is a package, "QCDF90", 
which is fully described in a long documentation 
where we provide all the information needed 
to use our package. In the following we will high- 
light the main characteristics of QCDF90. 

2. Geometry and field definitions 

The set of all lattice sites can be subdivided 
into "even" and "odd" sites according to whether 
the sum of the integer valued coordinates x + y + 
z + t is even or odd (checkerboard subdivision) . 
There are many algorithms which demand, espe- 
cially in the context of a parallel implementation, 
that even and odd variables be treated separately. 
In QCDF90 we have implemented such checker- 
boarded separation of the lattice in two sublat- 
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tices. Correspondingly all field variables are di- 
vided into even and odd variables. The first com- 
ponent of the type definition is an integer vari- 
able parity which will take values and 1 for 
variables defined over even and odd sites respec- 
tively. 

For the gauge variables it is convenient to in- 
clude in the type a single direction /i component 
(of definite parity, of course). The second compo- 
nent of the gauge field type is an integer variable 
dir which takes values from 1 to 4 for variables 
defined in the corresponding direction. 

In a vectorized or superscalar architecture 
pipelined instructions and longer arrays give ori- 
gin to better performance. Therefore the lattice 
is most efficiently indexed by a single lattice vol- 
ume index ranging xyzt from to NX * NY * 
NZ * NT/ 2 - 1 for each sublattices (where Ni is 
the lattice size in direction i). 

We have defined the following field types: 
gauge_f ield (a 3 * 3 complex matrix in a given 
direction); f ermi_f ield (a 3 component complex 
vector times 4 spinor indices); complexjf ield 
(a complex scalar); real_field (a real scalar); 
generator J ield (8 real variables in a given di- 
rection, for the SU (3) generators) . These are de- 
fined on an even or odd sublattice. In addition 
we define the type f ull_gauge Jield (a collec- 
tion of 8 gaugejield) and the type matrix as a 
3*3 complex matrix. 



2 



3. Overloaded operators 

The above type definitions are easily manipu- 
lated once one defines overloaded operators for 
all possible operations between fields, matrices, 
complex and real numbers. The overloaded op- 
erator set includes multiplication, multiplication 
with the adjoint, division, addition, subtrac- 
tion, lambda matrices algebraic manipulations, 
gamma matrices algebraic manipulations, adjoin- 
ing, conjugation, real and complex traces, expo- 
nentiation, square root, contraction, etc... 

For example, if gi are gauge fields, the use of 
overloaded operators allows instructions to be as 
simple as gi = g 2 *gs- 

To implement some useful algebraic opera- 
tions on the SU (3) generators we have over- 
loaded some further operators which perform spe- 
cial operations involving generator Jields and 
gauge_f ields. In particular it is very important 
to have an efficient algorithm for the exponenti- 
ation of a matrix, since this operation can be a 
time consuming component of several QCD cal- 
culations. The algorithm that we have used takes 
advantage of the properties of the 3*3 Hermitian 
traceless matrices to perform the exponentiation 
with a minimal number of arithmetic operations. 

4. Shifts 

Shifts are also implemented as overloaded op- 
erators. For each field types C-shift implements 
an ordinary shift of the field with respect to the 
Cartesian geometry of the lattice. 

Moreover, because gauge theories are charac- 
terized by the property of local gauge invariance, 
we find it very useful to directly define a U-shift 
operator that consists of the shift with the appro- 
priate parallel transport factor. In a gauge theory 
the U-shift is the relevant shift operator and is the 
natural building block for programming. 

In QCD the efficient manipulation of the Dirac 
operator is a very critical issue. Therefore for the 
Fermi fields we have also defined other shift op- 
erators which incorporate fundamental features 
of the Dirac operator. First we define a W-shift 
that combines the shift with the parallel trans- 
port and the appropriate gamma matrix manip- 



ulation, i.e. acting on a Fermi field fi W-shift 
produces a Fermi field f 2 , given by 

/2,x = (1 - 7^/l,x+A (1) 
for positive W-shift, and 

/2,x = (1+7m)^aA.«-A ( 2 ) 

for negative W-shift. The direct definition of this 
combined operator entails advantages of efficiency 
because, from the properties of the gamma ma- 
trices, it follows that only one half of the spin 
components undergo the transport. 

We overload the (Wilson) lattice Dirac opera- 
tor and, finally, we overload the operator Xdirac= 
75 Dirac 75. 

5. Assignments 

The use of overloaded operators may imply the 
creation of more temporaries and, consequently, 
more motion of data than a straightforward im- 
plementation of operations among arrays. Con- 
sider for example the following operation among 
variables of type fermLfield: f 1 = f 1 + f 2 + f3. 
As far as we know, Fortran 90 does not specify 
how the variables should be passed in function 
calls. As a consequence, the above instruction 
requires as many as four temporaries. The proce- 
dure could be drastically simplified through the 
use of an overloaded assignment + =. The above 
instruction could be written fl + = f2 + f3 
which the compiler would implement by issuing 
first a call to a function that adds f 2 and f 3 re- 
turning the result in tl. The addresses of f 1 and 
tl would then be passed to a subroutine that 
implements the operation fl = fl+tl among 
the components of the data types. The required 
number of copies to memory would be only two, 
instead of four. 

In order to allow for these possible gains in 
efficiency, we have defined a large set of over- 
loaded assignments. Since Fortran 90 permits 
only the use of the = symbol for the assign- 
ment, we defined two global variables: a charac- 
ter variable assign_type and an integer variable 
assign_spec (for assign specification, introduced 
to accommodate assignments of a more elaborate 
nature). The default values of these variables 
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are "=" and "0". Overloaded assignments are 
obtained by setting assign_type (and possibly 
assign_spec) to the appropriate value immedi- 
ately before the assignment. (Our example be- 
come assign_type =' +'; f 1 = f 2 + f 3). 

In the module "assign_mixed" , assignments are 
also defined between variables of different types. 
For example if Ci is a complex_field and complex 
is a complex variable, the instruction complex = 
Cj, is interpreted as setting the variable complex 
to the sum over all the lattice of the components 

of Ci. 

6. Random numbers 

We have implemented a parallelizable version 
of the unix pseudorandom number generator 
erand48, which also provides added functionality. 
Erand48 is a congruential pseudorandom number 
generator based on the iterative formula 



Si+i = ai * St 



bi mod 2 4 



(3) 



where eti = 0x5DEECE66D, b\ = OxB, Si and s,*+i 
are integers of at least 48 bits of precision. The 
"seeds" Si are converted to real pseudorandom 
numbers with uniform distribution between 
and 1 by r; = 2~ 48 

As presented above, the algorithm is intrin- 
sically serial. However it follows from Eq. (||) 
that the N th iterate Si+N is still of the form 
Si+N — a,N*Si + bN mod 2 8 with integers a at and 
&jv which are uniquely determined by oi, b\. The 
module takes advantage of this fact and of the 
definitions of a global variable seeds to generate 
pseudorandom numbers in a parallelizable fash- 
ion. The module "random_numbers" overloads 
operators that generate Gaussian or uniformly 
distributed fields and all the necessary seed ma- 
nipulations. 

7. Conditionals 

The module "conditional" defines six over- 
loaded relational operators, >, >=, <, < = , ==, 
/ =, and the .Xor. operator. 

The relational operators return a dummy logi- 
cal variable and at the same time the global vari- 
able context is set to .TRUE, at all sites where the 
relation is satisfied, to .FALSE, at all other sites. 



The operator .Xor. admits as operands a pair 
of fields of the same type and returns a field, also 
of the same type, having as elements the corre- 
sponding elements of the first operand at the sites 
where the global variable context is .TRUE., the 
elements of the second operand at the sites where 
context is .FALSE.. This can be used to select 
elements out of two fields according to some local 
condition, an operation which lies at the founda- 
tion of stochastic simulation techniques. 

8. Precision 

To render the precision definitions machine in- 
dependent, the module "precision" defines two 
kind parameters, REAL8 and LONG. These param- 
eters store the kind of an 8-byte floating point 
variable and of an 8-byte integer variable. 

9. Write and read configurations 

To store and retrieve an entire SU(3) gauge fi- 
eld configuration, we developed a portable, com- 
pressed ASCII format. Only the first two columns 
of the gauge field matrices are stored, thanks to 
unitarity and unimodularity. Our subroutines 
takes advantage of the fact that all of the ele- 
ments of the gauge field matrices have magnitude 
smaller or equal to 1 to re-express their real and 
imaginary parts as 48bit integers. These integers 
are then written in base 64, with the digits be- 
ing given by the ASCII collating sequence start- 
ing from character "0" (to avoid unwanted ASCII 
characters). Thus, an entire gauge field matrix 
is represented by 96 ASCII characters, without 
loss of numerical information. A detailed descrip- 
tion of the compressing procedure is written, as 
a header, at the beginning of the stored configu- 
ration file itself. 
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